245 research outputs found

PubChem3D: Shape compatibility filtering using molecular shape quadrupoles

Author: Bolton Evan E
Bryant Stephen H
Kim Sunghwan
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: PubChem provides a 3-D neighboring relationship, which involves finding the maximal shape overlap between two static compound 3-D conformations, a computationally intensive step. It is highly desirable to avoid this overlap computation, especially when it can be determined with certainty that a conformer pair cannot meet the criteria to be a 3-D neighbor. PubChem therefore employs a series of pre-filters, based on the concept of volume, to remove approximately 65% of all conformer neighbor pairs prior to shape overlap optimization. Given that molecular volume, a somewhat vague concept, is so effective, one may ask: can the existing PubChem 3-D neighboring relationship, which consists of billions of shape-similar conformer pairs from tens of millions of unique small molecules, be used to identify additional shape descriptor relationships? More specifically, can one place an upper bound on shape similarity using other "fuzzy" shape-like concepts such as length, width, and height? Results: Using a basis set of 4.18 billion 3-D neighbor pairs identified by single-conformer-per-compound neighboring of 17.1 million molecules, shape descriptors were computed for all conformers. These steric shape descriptors included several forms of molecular volume and shape quadrupoles, which essentially embody the length, width, and height of a conformer. For each 3-D neighbor conformer pair, the volume and each quadrupole component (Qx, Qy, and Qz) were binned and their frequency of occurrence was examined. For each molecular volume type, this effectively produced three maps, one per quadrupole component (Qx, Qy, and Qz), of the values allowed for pairs with a shape Tanimoto (ST) of 0.8 or greater.
The efficiency of these relationships (in terms of true positives, true negatives, false positives, and false negatives) as a function of ST threshold was determined in a test run of 13.2 billion conformer pairs not previously considered by the 3-D neighbor set. At ST ≥ 0.8, applying the separate Qx, Qy, and Qz maps in series (Qxyz) filtered out 40.4% of true negatives with only 32 false negatives out of 24 million true positives. This efficiency increased linearly with ST threshold over the range 0.8-0.99. The Qx filter was consistently the most efficient, followed by Qy and then Qz. Use of a monopole volume gave the best overall performance, followed by the self-overlap volume and then the analytic volume. In a "real world" test of 3-D neighboring of 4,218 chemicals of biomedical interest against 26.1 million molecules in PubChem, the monopole-based Qxyz filter reduced the total CPU cost of neighboring by 24-38% and, when used as the initial filter, removed 48.3% of all conformer pairs from consideration at almost negligible computational overhead. Conclusion: Basic shape descriptors, such as size, length, width, and height, can be highly effective in identifying shape-incompatible compound conformer pairs. When performing a 3-D search with a shape similarity cut-off, computation can be avoided by identifying conformer pairs that cannot meet the result criteria. Applying this methodology as a filter for PubChem 3-D neighboring computation yielded a 31% improvement, increasing the average conformer pair throughput from 154,000 to 202,000 per second per CPU core.
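A minimal sketch of how binned quadrupole maps could be applied in series as a pre-filter. The map layout (keyed by the query's volume bin and quadrupole bin, listing the target quadrupole bins seen among known neighbors), the bin widths, and the names `to_bin` and `qxyz_filter` are illustrative assumptions, not PubChem's actual data structures. Because such maps are empirical, a pair whose bins were never observed among known neighbors is rejected even if it could have qualified, which is how a small number of false negatives can arise.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

# Hypothetical layout: (query volume bin, query Q bin) -> target Q bins that were
# observed among known 3-D neighbor pairs at the chosen ST threshold.
ComponentMap = Dict[Tuple[int, int], FrozenSet[int]]


def to_bin(value: float, width: float) -> int:
    """Map a continuous descriptor value onto an integer bin index."""
    return int(value // width)


def qxyz_filter(query_vol: float,
                query_q: Sequence[float],
                target_q: Sequence[float],
                maps: Sequence[ComponentMap],
                vol_width: float,
                q_width: float) -> bool:
    """Return True to keep the pair for shape overlap optimization,
    False if any quadrupole map shows the pair was never seen as a neighbor."""
    v_bin = to_bin(query_vol, vol_width)
    # Apply the Qx, Qy and Qz maps in series; the first rejection ends the check.
    for comp_map, q_query, q_target in zip(maps, query_q, target_q):
        allowed = comp_map.get((v_bin, to_bin(q_query, q_width)), frozenset())
        if to_bin(q_target, q_width) not in allowed:
            return False
    return True


# Toy maps for a single query volume bin (bin 2); values are invented.
qx_map: ComponentMap = {(2, 5): frozenset({4, 5, 6})}
qy_map: ComponentMap = {(2, 3): frozenset({2, 3, 4})}
qz_map: ComponentMap = {(2, 1): frozenset({0, 1, 2})}
MAPS = [qx_map, qy_map, qz_map]

keep = qxyz_filter(250.0, (5.5, 3.2, 1.9), (6.1, 2.7, 0.4), MAPS, 100.0, 1.0)
skip = qxyz_filter(250.0, (5.5, 3.2, 1.9), (9.0, 3.0, 1.0), MAPS, 100.0, 1.0)
```

Ordering the components so that the most selective one (reported above as Qx) runs first lets most rejected pairs exit after a single dictionary lookup.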

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The PubChem chemical structure sketcher

Author: D Weininger
EE Bolton
Evan E Bolton
P Ertl
S Krause
Stephen H Bryant
WD Ihlenfeldt
Wolf D Ihlenfeldt
Publication venue: BioMed Central
Publication date: 01/01/2009

PubChem is an important public, Web-based information source for chemical and bioactivity information. To provide convenient structure search methods for the compounds stored in this database, one mandatory component is a Web-based drawing tool for interactive sketching of chemical query structures. Web-enabled chemical structure sketchers are not new and have existed for years; however, the available solutions rely on complex technology such as Java applets or platform-dependent plug-ins. Due to general policy and support incident rate considerations, Java-based or platform-specific sketchers cannot be deployed as part of public NCBI Web services. Our solution: a chemical structure sketching tool based exclusively on CGI server processing, client-side JavaScript functions, and image sequence streaming. The PubChem structure editor does not require any specific runtime support libraries or browser configurations on the client. It is completely platform-independent and verified to work on all major Web browsers, including older ones without support for Web 2.0 JavaScript objects.

Crossref

Springer - Publisher Connector

PubMed Central

PubChem3D: Diversity of shape

Author: A Nicholls
E Yuriev
EE Bolton
Evan E Bolton
EW Sayers
F Fontaine
G Schneider
GB McGaughey
J Kirchmair
JA Grant
JA Haigh
PCD Hawkins
RP Sheridan
Stephen H Bryant
Sunghwan Kim
V Venkatraman
YL Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: The shape diversity of 16.4 million biologically relevant molecules from the PubChem Compound database and their 1.46 billion diverse conformers was explored as a function of molecular volume. Results: The diversity of shape space was investigated by determining the shape similarity threshold to achieve a maximum on the count of reference shapes per unit of conformer volume. The rate of growth in shape space, as represented by a decreasing shape similarity threshold, was found to be remarkably smooth as a function of volume. There was no apparent correlation between the count of conformers per unit volume and their diversity, meaning that a single reference shape can describe the shape space of many chemical structures. The ability of a volume to describe the shape space of lesser volumes was also examined: for most of the volume range considered in this study, a given volume was able to describe 40-70% of the shape diversity of lesser volumes. Conclusion: The relative growth of shape diversity as a function of volume and shape similarity is surprisingly uniform. Given the distribution of chemicals in PubChem versus what is theoretically synthetically possible, the results of this analysis should be considered a conservative estimate of the true diversity of shape space.
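One common way to pick reference shapes is greedy leader selection: each conformer either falls within the similarity threshold of an existing reference shape or becomes a new one. The sketch below assumes a precomputed, symmetric pairwise shape similarity matrix and shows how the number of reference shapes depends on the threshold. The function names and the toy matrix are hypothetical, and this is an illustration of the general idea rather than the exact procedure used in the study.

```python
from typing import Dict, List, Sequence


def select_reference_shapes(sim: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Greedy leader selection over a symmetric shape similarity matrix.

    Conformer i becomes a new reference shape unless its similarity to some
    existing reference is already >= threshold."""
    refs: List[int] = []
    for i in range(len(sim)):
        if all(sim[i][r] < threshold for r in refs):
            refs.append(i)
    return refs


def reference_counts(sim: Sequence[Sequence[float]],
                     thresholds: Sequence[float]) -> Dict[float, int]:
    """Number of reference shapes needed at each similarity threshold."""
    return {t: len(select_reference_shapes(sim, t)) for t in thresholds}


# Toy 4-conformer similarity matrix (values are invented for illustration).
SIM = [
    [1.00, 0.90, 0.60, 0.50],
    [0.90, 1.00, 0.70, 0.55],
    [0.60, 0.70, 1.00, 0.85],
    [0.50, 0.55, 0.85, 1.00],
]
counts = reference_counts(SIM, [0.5, 0.8, 0.95])
```

Dividing such counts by the width of a volume bin would give a count of reference shapes per unit of conformer volume; greedy selection depends on input order, so it yields an upper estimate rather than a minimal cover.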

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open and FAIR transformation product data for improved suspect/non-target screening: REFTPs in the NORMAN-SLE, PubChem and patRoon

Author: Bolton Evan E
CHIRSIR Parviel
Helmus Rick
SCHYMANSKI Emma
Thiessen Paul A
Zhang Jian
Publication date: 13/06/2023

Peer reviewed. Presentation given at ICCE, Venice, 11-15 June 2023.

ZENODO

Open Repository and Bibliography - Luxembourg

Per- and Polyfluoroalkyl Substances (PFAS) in PubChem: 7 Million and Growing.

Author: Bolton Evan E
CHIRSIR Parviel
KONDIC Todor
SCHYMANSKI Emma
Thiessen Paul A
Zhang Jian
Publication venue: American Chemical Society (ACS)
Publication date: 07/11/2023

Peer reviewed. Per- and polyfluoroalkyl substances (PFAS) are of high concern, with calls to regulate them as a class. In 2021, the Organisation for Economic Co-operation and Development (OECD) revised the definition of PFAS to include any chemical containing at least one saturated CF2 or CF3 moiety. As a consequence, PubChem, one of the largest open chemical collections with 116 million compounds, now contains over 7 million PFAS under this revised definition. These numbers are several orders of magnitude higher than those of previously established PFAS lists (typically thousands of entries) and pose a formidable challenge to researchers and computational workflows alike. This article describes a dynamic, openly accessible effort to navigate and explore the >7 million PFAS and >21 million fluorinated compounds (September 2023) in PubChem by establishing the "PFAS and Fluorinated Compounds in PubChem" Classification Browser (or "PubChem PFAS Tree"). A total of 36,500 nodes support browsing of the content according to several categories, including classification, structural properties, regulatory status, and presence in existing PFAS suspect lists. Additional annotation and associated data can be used to create subsets (and thus manageable suspect lists or databases) of interest for a wide range of environmental, regulatory, exposomics, and other applications.
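The structural core of the revised definition, a saturated carbon carrying a CF2 or CF3 group, can be checked on a molecular graph. The sketch below is a simplified assumption-laden illustration: it uses a hypothetical toy graph format (element symbols plus `(i, j, order)` bond tuples) with explicit hydrogens, treats "saturated" as "single bonds only", excludes carbons bearing H, Cl, Br, or I, and ignores any exceptions in the OECD text. A real workflow would use a cheminformatics toolkit and the official definition.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)
EXCLUDED = {"H", "Cl", "Br", "I"}


def has_saturated_cf2_or_cf3(atoms: List[str], bonds: List[Bond]) -> bool:
    """Simplified check: is there a carbon with only single bonds that carries
    at least two F atoms and no H, Cl, Br or I? Hydrogens must be explicit."""
    neighbors: Dict[int, List[str]] = defaultdict(list)
    orders: Dict[int, List[int]] = defaultdict(list)
    for a, b, order in bonds:
        neighbors[a].append(atoms[b])
        neighbors[b].append(atoms[a])
        orders[a].append(order)
        orders[b].append(order)
    for i, element in enumerate(atoms):
        if element != "C" or any(o != 1 for o in orders[i]):
            continue  # not a carbon, or not saturated
        nbrs = neighbors[i]
        if nbrs.count("F") >= 2 and not EXCLUDED.intersection(nbrs):
            return True
    return False


# Trifluoroacetic acid, CF3-C(=O)OH: the CF3 carbon qualifies.
tfa_atoms = ["C", "F", "F", "F", "C", "O", "O", "H"]
tfa_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (4, 5, 2), (4, 6, 1), (6, 7, 1)]
# Difluoromethane, CH2F2: the carbon carries hydrogens, so it does not qualify.
dfm_atoms = ["C", "F", "F", "H", "H"]
dfm_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
```

A rule this broad matches any molecule containing even one such carbon, which illustrates why the count of PFAS in a large collection grows into the millions.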

Open Repository and Bibliography - Luxembourg

PubChem3D: Similar conformers

Author: A Bondi
A Nicholls
C Knox
CH Cashin
DC McLeod
EE Bolton
EL Tolman
Evan E Bolton
EW Sayers
F Fontaine
IS Haque
JA Grant
JA Haigh
JD Holliday
JEJ Mills
SS Kerwar
Stephen H Bryant
Sunghwan Kim
SW Muchmore
TS Rush
X Chen
Y Wang
YL Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of the functional groups typically used to define pharmacophore features. Results: The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine reveals unique sets of related chemical structures, providing significant additional biological annotation. The PubChem 3-D neighboring relationship is also shown to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity. In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers.
Perhaps surprisingly, the average count of conformer neighbors per conformer increases rather slowly with the number of diverse conformers considered: only a 70% increase for a tenfold growth in conformers per compound (a 68-fold increase in the conformer pairs considered). Neighboring 3-D conformers at the scale performed, if implemented naively, is an intractable problem on a modest-sized compute cluster. The methodology developed in this work relies on a series of filters to avoid performing 3-D superposition optimization when it can be determined that two conformers cannot possibly be neighbors. Most filters are based on volume constraints derived from the Tanimoto equation, avoiding incompatible conformers; others consider a preliminary superposition between conformers using reference shapes. Conclusion: The "Similar Conformers" 3-D neighboring relationship locates similar small molecules of biological interest that may go unnoticed when using traditional 2-D chemical structure graph-based methods, making it complementary to such methodologies. The computational cost of applying 3-D similarity methodology on a wide scale, such as the contents of PubChem, is a considerable issue to overcome. Using a series of efficient filters, an effective throughput rate of more than 150,000 conformers per second per processor core was achieved, more than two orders of magnitude faster than without filtering.
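A volume constraint from the Tanimoto equation can be derived in one step. With ST = V_AB / (V_A + V_B - V_AB), ST grows as the overlap V_AB grows; assuming the overlap cannot exceed the smaller of the two volumes (as holds for the geometric intersection of two solids), the largest achievable value is V_small / V_large. A pair whose volume ratio falls below the ST threshold can therefore be skipped without any superposition. The sketch below, with hypothetical function names, shows the bound and how sorting by volume turns it into a binary-search window; it is not claimed to be PubChem's exact filter.

```python
import bisect
from typing import List, Tuple


def st_upper_bound(vol_a: float, vol_b: float) -> float:
    """Upper bound on shape Tanimoto, ST = V_AB / (V_A + V_B - V_AB).

    ST increases with the overlap V_AB; assuming V_AB <= min(V_A, V_B),
    the maximum is V_small / V_large."""
    small, large = sorted((vol_a, vol_b))
    return small / large


def may_be_neighbors(vol_a: float, vol_b: float, st_threshold: float = 0.8) -> bool:
    """False means the pair can be skipped without any superposition."""
    return st_upper_bound(vol_a, vol_b) >= st_threshold


def candidate_window(sorted_vols: List[float], query_vol: float,
                     st_threshold: float = 0.8) -> Tuple[int, int]:
    """Index range [lo, hi) of a volume-sorted list that can pass the bound,
    i.e. volumes V with t * V_query <= V <= V_query / t."""
    lo = bisect.bisect_left(sorted_vols, query_vol * st_threshold)
    hi = bisect.bisect_right(sorted_vols, query_vol / st_threshold)
    return lo, hi


volumes = [200.0, 300.0, 350.0, 400.0, 480.0, 520.0, 600.0]
lo, hi = candidate_window(volumes, 400.0)
```

With the list sorted once, each query only visits the targets inside its window, so incompatible pairs are never even enumerated.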

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HESI UVCB Meeting - Integrating UVCBs and Related Data into Open Chemical Knowledgebases - PubChem and NORMAN-SLE

Author: Bolton Evan E.
ELAPAVALORE Anjana
Li Qingliang
SCHYMANSKI Emma
Thiessen Paul A.
Zaslavsky Leonid
Zhang Jian
Publication date: 18/09/2023

Presentation and poster (given remotely) for the HESI UVCB Meeting in Iceland, 18-19 September 2023.

Open Repository and Bibliography - Luxembourg

Automated annotation of chemical names in the literature with tunable accuracy

Author: A Copestake
AR Aronson
C Kolarik
CE Lipscomb
DL Banville
DM Jassop
E Bolton
Evan E Bolton
GG Chowdhury
JD Wren
Jun D Zhang
KM Hettne
Lewis Y Geer
P Corbett
PT Corbett
R Klinger
Stephen H Bryant
WJ Wilbur
YY Zhou
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: A significant portion of the biomedical and chemical literature refers to small molecules. Accurate identification and annotation of the compound names that are relevant to the topic of a given paper can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for this task because well-trained indexers can understand the topic of a paper as well as recognize key terms. However, given the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation. Results: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. This system aims to reproduce the MeSH term annotations on biomedical and chemical literature that would be created by indexers. When automated free-text matching was compared with manual indexing of 26 thousand MEDLINE abstracts, more than 40% of the annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporated several filters to remove "incorrect" annotations caused by nonspecific, partial, and low-relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96%, with a 28% recall rate. The best performance of MAA, as measured by the F statistic, was 66%, which compares favorably with other chemical name annotation systems. Conclusions: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records.
The current work was tested against MEDLINE, but the algorithm is not specific to this corpus, and it may be applicable to papers from chemical physics, materials, polymer, and environmental science, as well as to patents, biological assay descriptions, and other textual data.
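The precision/recall trade-off from restricting the scanned sections can be illustrated with a toy annotator. This sketch assumes naive case-insensitive substring matching (which would also match inside longer words) and the balanced F measure (the abstract does not state which F weighting was used). The section names, dictionary, and gold annotations are hypothetical, and MAA's actual filters for nonspecific, partial, and low-relevance names are not reproduced.

```python
from typing import Dict, Iterable, Set, Tuple

Annotation = Tuple[str, str]  # (abstract id, MeSH term)


def annotate(doc_id: str, sections: Dict[str, str],
             dictionary: Iterable[str], scan: Iterable[str]) -> Set[Annotation]:
    """Naive case-insensitive substring matching over the chosen sections only."""
    terms = list(dictionary)
    found: Set[Annotation] = set()
    for name in scan:
        text = sections.get(name, "").lower()
        for term in terms:
            if term.lower() in text:
                found.add((doc_id, term))
    return found


def evaluate(predicted: Set[Annotation], gold: Set[Annotation]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F (harmonic mean of the two)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


doc = {"title": "Aspirin inhibits COX activity",
       "abstract": "Effects were compared with caffeine; ethanol was used as solvent."}
gold = {("d1", "Aspirin")}
terms = ["Aspirin", "Caffeine", "Ethanol"]
strict = evaluate(annotate("d1", doc, terms, ["title"]), gold)
broad = evaluate(annotate("d1", doc, terms, ["title", "abstract"]), gold)
```

Scanning only the title keeps precision high here, while adding the abstract pulls in incidental mentions (a comparator and a solvent) that a manual indexer would not annotate.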

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PubChem3D: Biologically relevant 3-D similarity

Author: A Nicholls
AM Wassermann
BS Edwards
CJ Echeverri
CN Rupasinghe
D Dimova
DJ Diller
EE Bolton
Evan E Bolton
EW Sayers
F Bajorath
GB McGaughey
GM Maggiora
J Dunlop
J Inglese
JA Grant
JL Medina-Franco
JP Goddard
N LeDonne
N Malo
OH Aina
P Chen
P Willett
PT Corbett
RE White
RP Hertzberg
S Kim
S Pettersson
SA Sundberg
Stephen H Bryant
Sunghwan Kim
TS Rush
WH Moos
YC Martin
YL Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

PubChem3D: a new resource for scientists

Author: A Nicholls
AD Andricopulo
B Musafia
Bo Yu
D Hull
EE Bolton
Evan E Bolton
EW Sayers
F Fontaine
H Sun
ID Kuntz
J Bostrom
J Sadowski
JA Grant
JEJ Mills
Jian Zhang
Jie Chen
Jiyao Wang
JM Barnard
KJ Simmons
Lianyi Han
ML Mansfield
MS Lajiness
NA Meanwell
Paul A Thiessen
PCD Hawkins
RLM van Montfort
S Kim
Siqian He
Stephen H Bryant
Sunghwan Kim
TA Halgren
V Mohan
Vahan Simonyan
Wenyao Shi
X Chan
Yan Sun
YL Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: PubChem is an open repository for small molecules and their experimental biological activity. PubChem integrates and provides search, retrieval, visualization, analysis, and programmatic access tools in an effort to maximize the utility of contributed information. PubChem contains many diverse chemical structures with similar biological efficacies against targets that are difficult to interrelate using traditional 2-D similarity methods. A new layer called PubChem3D has been added to PubChem to assist in this analysis. Description: PubChem generates a 3-D conformer model description for 92.3% of all records in the PubChem Compound database (when considering the parent compound of salts). Each of these conformer models is sampled to remove redundancy, guaranteeing a minimum (non-hydrogen atom pair-wise) RMSD between conformers. A diverse conformer ordering gives a maximal description of the conformational diversity of a molecule when only a subset of available conformers is used. A pre-computed search per compound record gives immediate access to a set of 3-D similar compounds (called "Similar Conformers") in PubChem and their respective superpositions. Systematic augmentation of PubChem resources to include a 3-D layer provides users with new capabilities to search, subset, visualize, analyze, and download data. A series of retrospective studies helps demonstrate important connections between chemical structures and their biological function that are not obvious using 2-D similarity but are readily apparent by 3-D similarity. Conclusions: The addition of PubChem3D to the existing contents of PubChem is a considerable achievement, given the scope, scale, and the fact that the resource is publicly accessible and free.
With the ability to uncover latent structure-activity relationships of chemical structures, while complementing 2-D similarity analysis approaches, PubChem3D represents a new resource for scientists to exploit when exploring the biological annotations in PubChem.
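The two conformer-sampling ideas described above can be sketched as follows: pruning any conformer that falls within a minimum RMSD of one already kept, then ordering conformers so that each next one is as different as possible from those before it (a max-min, or farthest-point, ordering). The sketch assumes conformers of one molecule with identical atom ordering, heavy atoms only, and coordinates that are already superposed; a real pipeline would compute RMSD after optimal superposition. The function names are hypothetical, and whether PubChem uses exactly this ordering rule is not stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD over matched atoms; assumes a and b are already superposed."""
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def remove_redundant(confs: Sequence[Coords], min_rmsd: float) -> List[int]:
    """Keep a conformer only if it is >= min_rmsd from every conformer kept so far."""
    kept: List[int] = []
    for i, c in enumerate(confs):
        if all(rmsd(c, confs[k]) >= min_rmsd for k in kept):
            kept.append(i)
    return kept


def diverse_order(confs: Sequence[Coords]) -> List[int]:
    """Max-min ordering: start from conformer 0, then repeatedly append the
    conformer whose nearest already-ordered conformer is farthest away."""
    if not confs:
        return []
    order = [0]
    remaining = set(range(1, len(confs)))
    nearest = {i: rmsd(confs[i], confs[0]) for i in remaining}
    while remaining:
        # Ties are broken toward the lower index for reproducibility.
        nxt = max(remaining, key=lambda i: (nearest[i], -i))
        order.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            nearest[i] = min(nearest[i], rmsd(confs[i], confs[nxt]))
    return order


# Toy two-atom "conformers" (coordinates invented for illustration).
CONFS = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
```

Taking only the first few indices of such an ordering gives a small subset that still spans the conformational range, which is the property a diverse ordering is meant to provide.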

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central